On the random nature of 
(prime) number distribution 

Erika L. Alvarez^a, Jean Pestieau^b
^a Instituto de Fisica, Universidad Nacional Autonoma de Mexico,
Apartado postal 20364, 01000 Mexico D.F., Mexico
^b Institut de Physique Theorique, Universite catholique de Louvain,
Chemin du Cyclotron 2, B-1348 Louvain-la-Neuve, Belgique

Preliminary version, 14/12/2004 
Abstract 

Let $\pi(x)$ denote the number of primes smaller than or equal to $x$. We compare $\sqrt{\pi(x)}$ with
$\sqrt{R(x)}$ and $\sqrt{\mathrm{li}(x)}$, where $R(x)$ and $\mathrm{li}(x)$ are the Riemann function and the logarithmic
integral, respectively. We show a regularity in the distribution of the natural numbers in
terms of a phase related to $(\sqrt{\pi} - \sqrt{R})$ and indicate how $\mathrm{li}(x)$ can cross $\pi(x)$ for the first
time.

1 Introduction 
1.1 Preliminaries 

The function $\pi(x)$ counts the number of primes smaller than or equal to $x$.
For example, $\pi(2) = 1$, $\pi(3) = 2$, $\pi(4) = 2$, $\pi(5) = 3$, ... In 1792, when he was 15 years
old, Gauss proposed

$$\frac{x}{\ln x}$$

as an approximation to $\pi(x)$, which he refined afterwards [1] to



$$\mathrm{li}(x) = \mathrm{PV} \int_0^x \frac{dt}{\ln t}$$

where PV denotes the principal value of the integral. The function $\mathrm{li}(x)$ can also be written as
$\mathrm{li}(x) = \int_\mu^x dt/\ln t$, with $\mu = 1.4513692348\ldots$
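The two functions introduced so far can be evaluated with a few lines of code. The following is a minimal sketch, not the authors' Mathematica computation: the helper names `prime_pi_table` and `li` are hypothetical, $\pi(x)$ is obtained with a plain sieve of Eratosthenes, and $\mathrm{li}(x)$ is evaluated with the standard series $\mathrm{li}(x) = \gamma + \ln\ln x + \sum_{k \ge 1} (\ln x)^k/(k \cdot k!)$, which is an assumption of this example and is only suitable for moderate $x$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def prime_pi_table(limit):
    """Return a list t with t[n] = number of primes <= n, for 0 <= n <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    table, count = [], 0
    for flag in is_prime:
        count += flag          # True counts as 1
        table.append(count)
    return table


def li(x, terms=200):
    """Principal-value logarithmic integral for x > 1 (series in u = ln x)."""
    u = math.log(x)
    total = EULER_GAMMA + math.log(u)
    term = 1.0
    for k in range(1, terms + 1):
        term *= u / k          # term is now u^k / k!
        total += term / k
    return total


if __name__ == "__main__":
    table = prime_pi_table(100)
    print([table[n] for n in (2, 3, 4, 5)])   # the examples quoted in the text
    print(li(1.4513692348))                   # close to zero at x = mu
```

The series is used here only because it needs nothing beyond the standard library; for very large $x$ a dedicated numerical library would be a better choice.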

Later, Riemann [2] improved the approximation with his Riemann function $R(x)$, defined as

$$R(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n}\, \mathrm{li}(x^{1/n})$$

where $\mu$ is the Möbius function [3], given by

$$\mu(n) = \begin{cases} 0 & \text{if } n \text{ has one or more repeated prime factors} \\ 1 & \text{if } n = 1 \\ (-1)^k & \text{if } n \text{ is a product of } k \text{ different primes} \end{cases}$$
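As an illustration of these definitions, here is a small sketch with hypothetical helper names (`mobius`, `zeta`, `riemann_R`). Note the swapped-in technique: instead of summing $\mu(n)\,\mathrm{li}(x^{1/n})/n$ directly, which converges slowly, the sketch evaluates $R(x)$ with the equivalent Gram series $R(x) = 1 + \sum_{k \ge 1} (\ln x)^k / (k \cdot k! \cdot \zeta(k+1))$. The zeta values are approximated by a partial sum plus an Euler-Maclaurin tail correction, which is an assumption of this example.

```python
import math


def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    if n == 1:
        return 1
    k = 0                      # number of distinct prime factors found so far
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:     # p divides n at least twice
                return 0
            k += 1
        p += 1
    if n > 1:                  # one prime factor is left over
        k += 1
    return -1 if k % 2 else 1


def zeta(s, n_terms=50):
    """zeta(s) for real s > 1: partial sum plus Euler-Maclaurin tail terms."""
    head = sum(j ** -s for j in range(1, n_terms))
    N = float(n_terms)
    return head + N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s * N ** (-s - 1) / 12


def riemann_R(x, terms=100):
    """Riemann function R(x) for x > 1 through the Gram series."""
    u = math.log(x)
    total, term = 1.0, 1.0
    for k in range(1, terms + 1):
        term *= u / k          # term is now u^k / k!
        total += term / (k * zeta(k + 1))
    return total


if __name__ == "__main__":
    print([mobius(n) for n in range(1, 11)])
    print(riemann_R(100.0))
```

With 100 terms the series is adequate for the moderate values of $x$ used in these examples; larger $x$ would need more terms or a different method.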

Riemann also proposed that [1]

$$\pi(x) - R(x) = -\sum_{\rho} R(x^{\rho}) \tag{1}$$

where $\rho$ are the trivial and nontrivial zeros of the Riemann zeta function $\zeta$, which is
defined as

$$\zeta(s) = \sum_{k=1}^{\infty} \frac{1}{k^s}$$

for $\Re(s) > 1$. Although Riemann carried out the analytic continuation of $\zeta$ to the whole complex
plane except the point $s = 1$, an easier expression is given by [5]

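$$\zeta(s) = \frac{1}{1 - 2^{1-s}} \sum_{n=0}^{\infty} \frac{1}{2^{n+1}} \sum_{k=0}^{n} (-1)^k \binom{n}{k} (k+1)^{-s}$$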

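The trivial zeros of $\zeta$ are found easily from the relation [6]

$$\zeta(1-s) = 2 (2\pi)^{-s} \cos\left(\frac{\pi s}{2}\right) \Gamma(s)\, \zeta(s)$$

because when $s = 2n + 1$, with $n$ a positive integer, the cosine factor vanishes while $\Gamma(s)\zeta(s)$ stays finite, so that $\zeta(-2n) = 0$.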

With respect to the nontrivial zeros, the Riemann hypothesis [2] says that all of them
lie on the "critical" line, $\rho(t) = 1/2 + it$. It is one of the most important open problems in
mathematics today.

The prime number theorem, proved independently by de la Vallée-Poussin [7] and
Hadamard [8], assures that

$$\lim_{x\to\infty} \frac{\pi(x)}{\mathrm{li}(x)} = \lim_{x\to\infty} \frac{\pi(x)}{R(x)} = \lim_{x\to\infty} \frac{\pi(x) \ln x}{x} = 1$$
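A quick numerical look at two of these ratios shows how differently they approach 1. The sketch below uses hypothetical helper names (`count_primes`, `li`, `pnt_ratios`), a simple sieve, and the series for $\mathrm{li}(x)$ in powers of $\ln x$; it is only an illustration for small powers of ten and leaves out the ratio with $R(x)$ for brevity.

```python
import math

EULER_GAMMA = 0.5772156649015329


def count_primes(limit):
    """Number of primes <= limit (sieve of Eratosthenes on a bytearray)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # clear all multiples of p starting at p*p
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sum(sieve)


def li(x, terms=200):
    """li(x) = gamma + ln ln x + sum_{k>=1} (ln x)^k / (k * k!), for x > 1."""
    u = math.log(x)
    total, term = EULER_GAMMA + math.log(u), 1.0
    for k in range(1, terms + 1):
        term *= u / k
        total += term / k
    return total


def pnt_ratios(x):
    """Return (pi(x)/li(x), pi(x) ln(x) / x)."""
    n = count_primes(x)
    return n / li(x), n * math.log(x) / x


if __name__ == "__main__":
    for k in range(2, 7):
        r_li, r_x = pnt_ratios(10 ** k)
        print(f"x = 10^{k}: pi/li = {r_li:.5f}, pi*ln(x)/x = {r_x:.5f}")
```

In this range the ratio with $\mathrm{li}(x)$ is already within a fraction of a percent of 1, while the ratio with $x/\ln x$ is still several percent away.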

Currently $\pi(x)$ has been computed up to $x \sim$ 10^^. All the values of $\pi(x)$ computed
to date satisfy the inequality $\mathrm{li}(x) > \pi(x)$. However, in 1914 Littlewood [9] showed that
this inequality changes its sign infinitely often for very large $x$ [11].



1.2 Motivation 

In general, the absolute value of the difference between the function $\pi(x)$ and its
approximations, $\mathrm{li}(x)$ or $R(x)$, although smaller than $\sim \sqrt{\pi(x)}$, is a number much greater
than unity for large $x$. However, the absolute value of the difference between the
square roots of $\pi(x)$ and of $\mathrm{li}(x)$, or between the square roots of $\pi(x)$ and of $R(x)$, is
smaller than 1. These square-root differences are therefore what we consider in order to get a better view
of the approximations to $\pi(x)$. Figure 1a shows the difference $\sqrt{\pi(x)} - \sqrt{R(x)}$;
the maximal difference between these functions is $\sqrt{\pi(2)} - \sqrt{R(2)} = -0.244906$, at
$x = 2$. We see that $\sqrt{R(x)}$ averages $\sqrt{\pi(x)}$ very well. Figure 1b shows the
difference $\sqrt{\mathrm{li}(x)} - \sqrt{\pi(x)}$, whose maximal height corresponds to the point $x = 28$, where
$\sqrt{\mathrm{li}(28)} - \sqrt{\pi(28)} = 0.525426$. The thick line represents the function $\sqrt{\mathrm{li}(x)} - \sqrt{R(x)}$,
which is the "average" of the points $\sqrt{\mathrm{li}} - \sqrt{\pi}$. Not all the points are shown in either figure:
the density is higher in the center, and many external points are included to make the
border explicit. The points were calculated with Mathematica up to 10^^ and the rest were
taken from the tables of [10], which give values of $\pi(x)$ for numbers with three or four
significant digits; hence the points shown in the border beyond 10^^ are not necessarily the
points with the biggest difference $|\sqrt{\pi} - \sqrt{R}|$.

In section 2, our plan is to delimit the function $(\sqrt{\pi} - \sqrt{R})$ from above and below with a
tight function, in such a way that all the points remain inside the bounds; then to delimit
the functions $(\sqrt{\mathrm{li}} - \sqrt{\pi})$ and $(\mathrm{li} - \pi)$; and finally to discuss the statistical distribution of
a phase defined in terms of the functions previously mentioned.



2 Discussion 



2.1 $\sqrt{\pi} - \sqrt{R}$

One can study the general characteristics of the function $\sqrt{\pi(x)} - \sqrt{R(x)}$. The absolute
value of this function is bounded by its maximal value $|\sqrt{\pi(2)} - \sqrt{R(2)}| = 0.244906$.
So we can propose that $\sqrt{\pi(x)}$ is given by

$$\text{(i)} \quad \sqrt{\pi(x)} = \sqrt{R(x)} + \eta(x) \cos\delta(x), \qquad \eta(x) > 0 \tag{2}$$

where $\eta(x)$ is the envelope, which delimits all the points of Figure 1a.
Another parameterization is

$$\text{(ii)} \quad \alpha(x) = \sqrt{R(x)} + \eta(x) e^{i\delta(x)}, \qquad \eta(x) > 0, \quad |\alpha(x)|^2 = \pi(x) \tag{3}$$

which makes explicit the parameterization in terms of an amplitude $\eta(x)$ and a
phase $\delta(x)$. Equation (3) implies

$$\pi(x) = R(x) + 2\eta(x) \cos\delta \sqrt{R(x)} + \eta^2(x) \tag{4}$$
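The step from the complex parameterization to Equation (4) is just $|\sqrt{R} + \eta e^{i\delta}|^2 = (\sqrt{R} + \eta\cos\delta)^2 + \eta^2\sin^2\delta$. The following sketch checks this numerically; the function names and the sample values of $R$, $\eta$ and $\delta$ are hypothetical and chosen only for illustration. It also shows that, for the same $(R, \eta, \delta)$, the real parameterization of Equation (2) differs from Equation (4) by exactly $\eta^2 \sin^2\delta$.

```python
import cmath
import math


def pi_from_alpha(R, eta, delta):
    """|alpha|^2 with alpha = sqrt(R) + eta * exp(i * delta), as in Equation (3)."""
    alpha = math.sqrt(R) + eta * cmath.exp(1j * delta)
    return abs(alpha) ** 2


def pi_from_eq4(R, eta, delta):
    """Right-hand side of Equation (4)."""
    return R + 2 * eta * math.cos(delta) * math.sqrt(R) + eta ** 2


def pi_from_eq2(R, eta, delta):
    """Square of the right-hand side of Equation (2)."""
    return (math.sqrt(R) + eta * math.cos(delta)) ** 2


if __name__ == "__main__":
    R, eta, delta = 25.66, 0.166, 2.0          # illustrative values only
    print(pi_from_alpha(R, eta, delta), pi_from_eq4(R, eta, delta))
    gap = pi_from_eq4(R, eta, delta) - pi_from_eq2(R, eta, delta)
    print(gap, (eta * math.sin(delta)) ** 2)   # the two numbers agree
```

Since the gap is at most $\eta^2$, the two parameterizations can only differ noticeably where $\eta^2$ is not small compared with $|\pi - R|$.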

Observe that, when $\delta(x) = 0$ or $\pi$, Equations (2) and (3) coincide. The first proposal for
$\eta(x)$ is the function

$$\eta(x) = \frac{0.2595}{\ln\ln(x + 15.9)} \tag{5}$$

However, from previous work we know that the first zero of the function $\sqrt{\mathrm{li}(x)} - \sqrt{\pi(x)}$
happens before $x = 1.3982 \times 10^{316}$, and may be much earlier. With the function of Equation (5),
the crossing of the $x$ axis occurs around $x =$ 10^^. A function for which the crossing occurs around $x = 1.3982 \times 10^{316}$ is

$$\eta(x) = \frac{0.315647}{[\ln(x + 4.07206)]^{0.430202}} \tag{6}$$

If $\sqrt{\mathrm{li}(x)} - \sqrt{\pi(x)}$ crossed the axis earlier, $\eta(x)$ would be a function between the ones defined
in Equation (5) and Equation (6). Figure 2 shows the points $\sqrt{\pi(x)} - \sqrt{R(x)}$ with
the two bounds, and Figure 3 shows the points $\sqrt{\mathrm{li}(x)} - \sqrt{\pi(x)}$ with their "average"
function $\sqrt{\mathrm{li}(x)} - \sqrt{R(x)}$, where the borders are given by

$$\sqrt{\mathrm{li}(x)} - \sqrt{R(x)} \pm \eta(x)$$



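2.2 $\mathrm{li} - \pi$

We can delimit $\mathrm{li} - \pi$ from above and below.

From Equation (2) and Equation (3), and using $\mathrm{li} - \pi = \mathrm{li} - R + R - \pi$, one has that

$$\mathrm{li} - (\sqrt{R} + \eta)^2 < \mathrm{li} - \pi < \mathrm{li} - (\sqrt{R} - \eta)^2 \tag{7}$$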



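Using the fact that, in the limit of large $x$, $\mathrm{li}(x) - R(x) \approx \sqrt{x}/\ln x$, $\sqrt{R} \approx \sqrt{x/\ln x}$, and
that $\eta^2$ is negligible, one has

$$\frac{\sqrt{x}}{\ln x} - 2\eta \sqrt{\frac{x}{\ln x}} < \mathrm{li} - \pi < \frac{\sqrt{x}}{\ln x} + 2\eta \sqrt{\frac{x}{\ln x}} \tag{8}$$

The lower bound is negative only when $\eta > 1/(2\sqrt{\ln x})$. Hence, if there are values where $\mathrm{li}(x)$ is smaller than $\pi(x)$, then $\eta(x)$ must decrease
more slowly than $1/(2\sqrt{\ln x})$, as happens with Equation (5) and Equation (6).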
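The condition $\eta(x) > 1/(2\sqrt{\ln x})$ can be located numerically for the two envelopes. The sketch below works with $L = \ln x$ to avoid huge numbers and finds, by bisection, where each envelope starts to exceed the threshold. Everything here is a rough illustration under stated assumptions: the large-$x$ forms $\ln(x + c) \approx \ln x$ are used, the second envelope is taken with exponent 0.430202 (an assumption of this example), and the names `threshold`, `eta1_large`, `eta2_large` and `crossing` are hypothetical.

```python
import math


def threshold(L):
    """1 / (2 sqrt(ln x)) written in terms of L = ln x."""
    return 1.0 / (2.0 * math.sqrt(L))


def eta1_large(L):
    """Large-x form of 0.2595 / ln ln(x + 15.9), with ln(x + 15.9) ~ L."""
    return 0.2595 / math.log(L)


def eta2_large(L, exponent=0.430202):
    """Large-x form of 0.315647 / [ln(x + 4.07206)]^exponent (exponent assumed)."""
    return 0.315647 / L ** exponent


def crossing(eta, lo, hi, steps=200):
    """Bisection for eta(L) = threshold(L), assuming one sign change in [lo, hi]."""
    def f(L):
        return eta(L) - threshold(L)
    assert f(lo) < 0 < f(hi)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


L1 = crossing(eta1_large, 20.0, 2000.0)
L2 = crossing(eta2_large, 20.0, 2000.0)

if __name__ == "__main__":
    print("first envelope : ln x =", L1, " log10 x =", L1 / math.log(10))
    print("second envelope: ln x =", L2, " log10 x =", L2 / math.log(10))
```

For the second envelope the crossing comes out at $\ln x \approx 728$, which is close to $\ln(1.3982 \times 10^{316})$; the value for the first envelope is only indicative, because the asymptotic forms used here are crude at moderate $x$.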

Figures 4b and 4c show $\mathrm{li}(x) - \pi(x)$, using for the bounds Equation (8)
with $\eta(x)$ given by Equation (5) and Equation (6). The bounds of Equation (8) only work
for large $x$, when $R(x) \approx \mathrm{li}(x) - (1/2)\,\mathrm{li}(x^{1/2})$. For small $x$, Equation (8) is not valid and
we use the bounds (7) directly; Figure 4a shows the latter in the interval
$x \in$ (2, 10'^). The thick line corresponds to the "average" function $(\mathrm{li}(x) - R(x))$.

2.3 $\cos\delta$

With a sample of the first natural numbers one averages the functions $\sqrt{\pi} - \sqrt{R}$ and $\pi - R$.
The values of Table 1 are obtained for different sample sizes. In this table, $\sigma(f)$ is the
standard deviation, $\sigma = \sqrt{\langle f^2 \rangle - \langle f \rangle^2}$, with $f$ equal to $(\sqrt{\pi} - \sqrt{R})$ or to $(\pi - R)$. We
see that $\langle \sqrt{\pi} - \sqrt{R} \rangle$ is a small number bigger than zero and varies little between the
different intervals.

Solving for $\cos\delta$ in both cases, Equations (2) and (4), one has

$$\cos\delta = \frac{\sqrt{\pi(x)} - \sqrt{R(x)}}{\eta(x)} \qquad \text{and} \qquad \cos\delta = \frac{\pi(x) - R(x) - \eta^2(x)}{2\sqrt{R(x)}\,\eta(x)}$$

respectively. Taking the first definition of $\eta(x) = \eta_1(x)$, Equation (5), one has the
averages of Table 2 in the intervals $x \in (2, 100), \ldots, x \in (2, 10^6)$.

The results of Table 2 show that the averages remain approximately constant. With
respect to the width $\sigma$ of the distribution, as well as to the average of the absolute value of $\cos\delta$,
the difference between the parameterizations of Equation (2) and Equation (4) is negligible.
Also, although for the first intervals the difference in the average $\langle\cos\delta\rangle$ is bigger, as $x$
grows the averages in the two parameterizations get closer, because in general the ratio
$\eta^2/|\pi - R| \ll 1$. From now on, we will keep the parameterization of Equation (2).

Taking the other proposal, $\eta(x) = \eta_2(x)$, Equation (6), the averages of Table 3 are
found. In this table, the average value of $\cos\delta$ is not very different from the one of the previous
parameterization, being consistent with a small positive number.

In order to see the weight of the different sets of numbers with respect to $\cos\delta$, in Table
4 we give the average of $\cos\delta$ for natural, prime, even and odd (without primes) numbers.
We see that as $x$ grows, the prime distribution, which has a higher $\cos\delta$ average, has a
smaller weight, because the ratio of prime to natural numbers decreases approximately as
$\pi(x)/x \sim 1/\ln x$. So the average of $\cos\delta$ for the even and odd natural numbers will be
approximately the same for large $x$.

Let us take $\eta(x)$ given by Equation (5): if we divide $\cos\delta$ into the intervals
$(-1, -0.95), (-0.95, -0.85), \ldots, (0.85, 0.95), (0.95, 1)$, we find the distributions of Table 5. They
give the number of positive integers whose $\cos\delta$ falls in each of these intervals, counted
in 4 different sample sizes: $(2, 10^3)$, $(2, 10^4)$, $(2, 10^5)$ and $(2, 10^6)$.

Figures 5 and 6 show the distributions of $\cos\delta$ explained in the previous paragraph.
We have normalized them so that the total area of the bars equals one. For
example, for the natural numbers in $(2, 10^3)$, there are 47 numbers whose $\cos\delta$ falls
in the interval $(-0.45, -0.35)$. We divide these 47 numbers by the total number in the sample,
999, to obtain the relative frequency, and multiply by 10, because the size of each interval
is 0.1 (except for the intervals $(-1, -0.95)$ and $(0.95, 1)$).
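The normalization just described can be reproduced from the $(2, 10^3)$ column of Table 5. In the sketch below the variable names are hypothetical, and the two cells that are blank in that column are assumed to be zero, which is consistent with the column adding up to 999.

```python
# Bin edges: (-1, -0.95), (-0.95, -0.85), ..., (0.85, 0.95), (0.95, 1)
edges = [-1.0] + [round(-0.95 + 0.1 * i, 2) for i in range(20)] + [1.0]

# Counts for the sample (2, 10^3) as listed in Table 5 (blank cells taken as 0)
counts = [1, 0, 0, 5, 11, 31, 47, 78, 116, 144, 136,
          130, 106, 76, 57, 36, 11, 6, 5, 2, 1]

total = sum(counts)                                   # 999 integers, from 2 to 1000
widths = [b - a for a, b in zip(edges[:-1], edges[1:])]

# Bar height = relative frequency divided by the bin width
density = [c / total / w for c, w in zip(counts, widths)]

if __name__ == "__main__":
    print("total:", total)
    print("bar for (-0.45, -0.35):", density[6])      # 47 / 999 * 10
    print("area:", sum(d * w for d, w in zip(density, widths)))
```

Dividing by the bin width rather than multiplying by a fixed factor of 10 also handles the two half-width bins at the ends.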

The distribution is Gaussian and, from Table 2, the width appears to have the same
value, $\sigma = 0.28$, regardless of the number of positive integers over which we take the
average. The average seems to stabilize around $\langle\cos\delta\rangle = 0.014$. In both Figures
5 and 6 we used the same Gaussian with width $\sigma = 0.28$, average $\langle\cos\delta\rangle = 0.014$ and
height $1/(\sqrt{2\pi}\,\sigma) = 1.425$, and the fit of the Gaussian is in very good agreement with the
data.
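The quoted height follows directly from the width, and the Gaussian can be compared with a normalized bar of the histogram. The sketch below is only an illustration: the function name `gaussian` is hypothetical, and the comparison uses the single bar $(-0.45, -0.35)$ of the $(2, 10^3)$ sample, whose height is $47/999 \times 10$.

```python
import math

SIGMA = 0.28    # width quoted in the text
MEAN = 0.014    # average of cos(delta) quoted in the text


def gaussian(c, mean=MEAN, sigma=SIGMA):
    """Normal density with the given mean and width, evaluated at c."""
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


if __name__ == "__main__":
    print("height:", gaussian(MEAN))              # 1 / (sqrt(2 pi) * sigma)
    bar = 47 / 999 * 10                           # bar for (-0.45, -0.35)
    print("bar:", bar, " Gaussian at -0.4:", gaussian(-0.4))
```

The Gaussian evaluated at the bin center lies within a few percent of the bar height, in line with the agreement described in the text.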

Finally, from Equation (2) and Equation (4),

$$\pi - R \approx 2\sqrt{R}\,\eta \cos\delta$$

so $(\pi(x) - R(x))/(2\sqrt{R(x)}\,\eta(x))$ follows the same Gaussian distribution.

3 Conclusions 

With two parameters, an amplitude $\eta$ and a phase $\delta$, we study the properties of the square roots of
the functions $\pi$, $\mathrm{li}$ and $R$, using Equations (2) and (3). With $\eta$, we delimit the differences
$(\sqrt{\pi} - \sqrt{R})$, $(\sqrt{\mathrm{li}} - \sqrt{\pi})$ and $(\mathrm{li} - \pi)$. Concerning the last one, we know from the data that
$(\mathrm{li} - \pi < \sqrt{\pi})$, and in Equation (7) we give a more precise relation. We find that $\cos\delta$
follows a Gaussian distribution, which shows a stable random behavior of the function $\pi(x)$;
see Figures 5 and 6. For the different sample sizes considered, the $\cos\delta$ distribution remains constant.
The open question is whether the Gaussian shape remains constant as $x$ grows.



Appendix 

To see how the natural numbers are distributed over the different $\cos\delta$ intervals, we give as
an example the first hundred in Table 6, where the prime numbers have been marked.

We can see that each time there is a new prime number, $\cos\delta$ increases; afterwards, while
$\pi(x)$ remains constant until the next prime number, the following integers fall
in intervals with smaller $\cos\delta$. So Table 5 and Figures 5 and 6 show that the way
the prime numbers appear implies the randomness of the natural numbers with
respect to $\cos\delta$. In Figure 7 we give a pictorial representation of how the first one hundred
natural numbers (except 1) are distributed, where the lines join points with the same
$\pi(x)$.
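The rise at each prime and the decline in between can be reproduced for the first hundred integers. The sketch below is not the authors' computation: it uses hypothetical helper names, a plain sieve for $\pi(x)$, the Gram series for $R(x)$ (a swapped-in technique, with zeta values approximated by a partial sum plus an Euler-Maclaurin correction), and the envelope $0.2595/\ln\ln(x + 15.9)$.

```python
import math


def prime_pi_table(limit):
    """Return (table, flags): table[n] = pi(n), flags[n] = True if n is prime."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    table, count = [], 0
    for f in flags:
        count += f
        table.append(count)
    return table, flags


def zeta(s, n_terms=50):
    """zeta(s) for real s > 1: partial sum plus Euler-Maclaurin tail terms."""
    head = sum(j ** -s for j in range(1, n_terms))
    N = float(n_terms)
    return head + N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s * N ** (-s - 1) / 12


ZETA = [zeta(k + 1) for k in range(1, 81)]    # zeta(2), ..., zeta(81)


def riemann_R(x):
    """Gram series R(x) = 1 + sum_k (ln x)^k / (k * k! * zeta(k+1))."""
    u = math.log(x)
    total, term = 1.0, 1.0
    for k, z in enumerate(ZETA, start=1):
        term *= u / k
        total += term / (k * z)
    return total


def eta1(x):
    return 0.2595 / math.log(math.log(x + 15.9))


def cos_delta(x, pi_x):
    """cos(delta) = (sqrt(pi(x)) - sqrt(R(x))) / eta(x)."""
    return (math.sqrt(pi_x) - math.sqrt(riemann_R(x))) / eta1(x)


PI, IS_PRIME = prime_pi_table(100)
COS = {x: cos_delta(x, PI[x]) for x in range(2, 101)}

if __name__ == "__main__":
    for x in range(2, 31):
        mark = "*" if IS_PRIME[x] else " "
        print(f"{x:3d}{mark} pi = {PI[x]:2d}  cos(delta) = {COS[x]:+.3f}")
```

In this sketch $\cos\delta$ goes up at every prime from 3 to 100 and goes down at every composite in that range, which is the sawtooth pattern described above.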

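That $\cos\delta$ decreases while $\pi(x)$ remains constant, that is, while no new prime number
appears, is because $\sqrt{R(x)}$ is a monotonically increasing function. With the
appearance of a new prime number, $\cos\delta$ increases and the cycle is repeated. The rate
at which $\cos\delta$ decreases is given by its derivative, and as the derivative of $R(x)$ is

$$\frac{dR}{dx} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n\, x^{(n-1)/n} \ln x} = \frac{1}{\ln x} \left( 1 - \frac{1}{2x^{1/2}} - \frac{1}{3x^{2/3}} - \cdots \right)$$

then, with the parameterization of Equation (5), the derivative of $\cos\delta(x)$ for a constant
$\pi(x)$ is

$$\frac{d\cos\delta}{dx} = \frac{1}{0.2561} \left[ \frac{\sqrt{\pi} - \sqrt{R}}{(x + 15.5) \ln(x + 15.5)} - \frac{\ln\ln(x + 15.5)}{2\sqrt{R}\,\ln x} \left( 1 - \frac{1}{2x^{1/2}} - \cdots \right) \right]$$

while for the parameterization of Equation (6) it is

$$\frac{d\cos\delta}{dx} = \frac{[\ln(x + 4.07)]^{0.43}}{0.3156} \left[ \frac{0.43\,(\sqrt{\pi} - \sqrt{R})}{(x + 4.07) \ln(x + 4.07)} - \frac{1}{2\sqrt{R}\,\ln x} \left( 1 - \frac{1}{2x^{1/2}} - \cdots \right) \right]$$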

In both cases the derivative is dominated by the negative term, as expected, and
decreases in absolute value as $x$ grows. In Figures 8 and 9 some other intervals of 100
numbers are compared for larger $x$, where it is seen that $\cos\delta$ becomes more horizontal; this
is because there is a larger number of points with the same $\pi(x)$. Also, although at the
beginning there are "jumps" when one goes from $\pi(p - 1)$ to $\pi(p)$, whose difference is one,
as $x$ increases $\cos\delta$ turns into a smoother function, because $(\pi(p) - \pi(p - 1))/\pi(p) \to 0$.

Tables

Interval    $\langle\sqrt{\pi}-\sqrt{R}\rangle$    $\sigma(\sqrt{\pi}-\sqrt{R})$    $\langle\pi-R\rangle$    $\sigma(\pi-R)$
2 - 10^2    0.001889    0.062256    0.033137    0.403334
2 - 10^3    0.001363    0.042803    0.013466    0.714523
2 - 10^4    0.001302    0.035624    0.050812    1.72635
2 - 10^5    0.001529    0.031321    0.25608     4.23254
2 - 10^6    0.001405    0.028509    0.705741    11.1907

Table 1: averages $\langle\sqrt{\pi} - \sqrt{R}\rangle$ and $\langle\pi - R\rangle$ in 5 intervals; $\sigma$ is the standard deviation





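            $\cos\delta = (\sqrt{\pi}-\sqrt{R})/\eta_1$              $\cos\delta = (\pi-R-\eta_1^2)/(2\sqrt{R}\,\eta_1)$
Interval    $\langle\cos\delta\rangle$    $\sigma$    $\langle|\cos\delta|\rangle$    $\langle\cos\delta\rangle$    $\sigma$    $\langle|\cos\delta|\rangle$
2 - 10^2    0.014402    0.315325    0.254145    -0.010719    0.315370    0.253362
2 - 10^3    0.008304    0.280109    0.223332    -0.000662    0.280161    0.223363
2 - 10^4    0.009965    0.283603    0.224534     0.006999    0.282839    0.224425
2 - 10^5    0.014043    0.281287    0.222306     0.013073    0.281312    0.222302
2 - 10^6    0.014057    0.278975    0.227005     0.013740    0.278979    0.226989

Table 2: averages of $\cos\delta$ defined by Equation (2) and of $\cos\delta$ given by Equation (4),
where $\eta_1$ is given by Equation (5)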

Interval    $\langle\cos\delta\rangle = \langle(\sqrt{\pi}-\sqrt{R})/\eta_2\rangle$    $\sigma$    $\langle|\cos\delta|\rangle$
2 - 10^2    0.015587    0.327365    0.264286
2 - 10^3    0.008606    0.279154    0.222093
2 - 10^4    0.009728    0.274722    0.217424
2 - 10^5    0.013529    0.270749    0.213973
2 - 10^6    0.013608    0.269512    0.219318

Table 3: averages of $\cos\delta$ defined in Equation (2), with $\eta_2$ given in Equation (6)

Interval    all (without 1)    primes               even (without 2)    odd without 1 and without primes    odd (without 1)
2 - 10^2    0.014402           0.256581 (25)        -0.073010           -0.056451 (25)                      0.122513
2 - 10^3    0.008304           0.182444 (168)       -0.027533           -0.025953 (332)                     0.046160
2 - 10^4    0.009965           0.099122 (1229)      -0.002571           -0.002474 (3771)                    0.022703
2 - 10^5    0.014043           0.051776 (9592)       0.009936            0.010169 (40408)                   0.018171
2 - 10^6    0.014057           0.028670 (78498)      0.012749            0.012886 (421502)                  0.015366

Table 4: average of $\cos\delta$, according to the set of positive integers over which the average
is taken; the numbers in parentheses are the number of positive integers in the given set
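The set sizes quoted in parentheses in Table 4 can be checked independently with a sieve. The sketch below uses hypothetical helper names (`prime_flags`, `set_sizes`) and only counts the members of each set; it does not recompute the averages of $\cos\delta$.

```python
import math


def prime_flags(limit):
    """bytearray f with f[n] = 1 if n is prime, for 0 <= n <= limit."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags


def set_sizes(limit):
    """Sizes of the sets of integers in [2, limit] used in Table 4."""
    primes = sum(prime_flags(limit))
    odd_without_1 = len(range(3, limit + 1, 2))
    return {
        "all without 1": limit - 1,
        "primes": primes,
        "even without 2": len(range(4, limit + 1, 2)),
        # every prime except 2 is odd, so remove primes - 1 from the odd numbers
        "odd without 1 and without primes": odd_without_1 - (primes - 1),
        "odd without 1": odd_without_1,
    }


if __name__ == "__main__":
    for k in range(2, 7):
        print(f"2 - 10^{k}:", set_sizes(10 ** k))
```

The counts of primes and of odd composites obtained this way coincide with the numbers in parentheses in the table.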



$\cos\delta$        2 - 10^3    2 - 10^4    2 - 10^5    2 - 10^6
(-1, -0.95)         1           2           3           3
(-0.95, -0.85)      0           3           26          331
(-0.85, -0.75)      0           14          120         2230
(-0.75, -0.65)      5           67          522         5024
(-0.65, -0.55)      11          186         1303        15247
(-0.55, -0.45)      31          370         2504        30391
(-0.45, -0.35)      47          490         4630        55051
(-0.35, -0.25)      78          657         7490        78559
(-0.25, -0.15)      116         880         11776       94341
(-0.15, -0.05)      144         1389        13740       114888
(-0.05, 0.05)       136         1530        15040       138262
(0.05, 0.15)        130         1387        13006       138171
(0.15, 0.25)        106         1038        10645       115027
(0.25, 0.35)        76          760         6816        92749
(0.35, 0.45)        57          575         5082        68886
(0.45, 0.55)        36          390         3835        34856
(0.55, 0.65)        11          187         1915        10700
(0.65, 0.75)        6           52          871         3833
(0.75, 0.85)        5           17          397         1086
(0.85, 0.95)        2           4           267         373
(0.95, 1)           1           1           11          11

Table 5: number of positive integers with $\cos\delta$ in the intervals
$(-1, -0.95), (-0.95, -0.85), \ldots$ for different samples: from $(2, 10^3)$ to $(2, 10^6)$

$\cos\delta$        integers
(-1, -0.95)         2*
(-0.65, -0.55)      4, 10
(-0.55, -0.45)      28, 36, 40, 58, 96
(-0.45, -0.35)      16, 57, 66, 95, 100
(-0.35, -0.25)      9, 27, 35, 39, 52, 70, 94, 99
(-0.25, -0.15)      6, 12, 56, 60, 65, 98
(-0.15, -0.05)      15, 22, 26, 30, 34, 38, 42, 51, 55, 64, 69, 78, 88, 93
(-0.05, 0.05)       3*, 18, 46, 50, 59*, 68, 82, 87, 92, 97*
(0.05, 0.15)        8, 11*, 25, 29*, 33, 37*, 41*, 45, 54, 63, 72, 77, 86, 91
(0.15, 0.25)        5*, 14, 21, 49, 53*, 62, 67*, 71*, 76, 81, 90
(0.25, 0.35)        17*, 24, 32, 44, 48, 75, 80, 85
(0.35, 0.45)        20, 61*, 79*, 84, 89*
(0.45, 0.55)        7*, 13*, 31*, 43*, 47*, 74, 83*
(0.55, 0.65)        23*, 73*
(0.65, 0.75)        19*

Table 6: distribution of the first one hundred natural numbers (without 1) in the different
intervals of $\cos\delta$; the primes are marked with an asterisk.






Figures 




Figure 1: (a) $\sqrt{\pi(x)} - \sqrt{R(x)}$ vs $\ln x$ and (b) $\sqrt{\mathrm{li}(x)} - \sqrt{\pi(x)}$ vs $\ln x$, in $x \in$ (2,10^^),
where the thick line is the function $\sqrt{\mathrm{li}(x)} - \sqrt{R(x)}$



Figure 2: $\sqrt{\pi(x)} - \sqrt{R(x)}$ vs $\ln x$, envelopes $\eta(x) = 0.2595/\ln\ln(x + 15.9)$ (continuous
line) and $\eta(x) = 0.315647/[\ln(x + 4.07206)]^{0.430202}$ (dashed line)

Figure 3: $\sqrt{\mathrm{li}(x)} - \sqrt{\pi(x)}$ vs $\ln x$, envelopes $\eta(x) = 0.2595/\ln\ln(x + 15.9)$ (continuous line)
and $\eta(x) = 0.315647/[\ln(x + 4.07206)]^{0.430202}$ (dashed line), and the function $\sqrt{\mathrm{li}(x)} - \sqrt{R(x)}$
(thick line)




Figure 4: $\mathrm{li}(x) - \pi(x)$ vs $\ln x$, in (a) $x \in$ (2,10''), (b) $x \in$ (5 x 10*^,2 x 10^"^) and (c)
$x \in$ (2 x 10^^,8 x 10^^), with $\eta(x) = 0.2595/\ln\ln(x + 15.9)$ (continuous line), $\eta(x) =
0.315647/[\ln(x + 4.07206)]^{0.430202}$ (dashed line) and $\mathrm{li}(x) - R(x)$ (thick line)

Figure 5: distribution of $\cos\delta$ with $\eta(x) = 0.2595/\ln\ln(x + 15.9)$, where the relative
frequency of $\cos\delta$ has been counted in the intervals $(-1, -0.95), (-0.95, -0.85), \ldots$; the Gaussian
is represented by the continuous line, (a) for the first 10^ natural numbers (except 1) and
(b) for the first 10^ ones

Figure 7: $\cos\delta = (\sqrt{\pi(x)} - \sqrt{R(x)})/\eta(x)$ vs $x$, $x \in (2, 100)$, $\eta(x) = 0.2595/\ln\ln(x + 15.9)$

Figure 8: $\cos\delta = (\sqrt{\pi(x)} - \sqrt{R(x)})/\eta(x)$ vs $x$, $x \in (15000, 15100)$, $\eta(x) = 0.2595/\ln\ln(x + 15.9)$

Figure 9: $\cos\delta = (\sqrt{\pi(x)} - \sqrt{R(x)})/\eta(x)$ vs $x$, $x \in (1000000, 1000100)$, $\eta(x) =
0.2595/\ln\ln(x + 15.9)$